Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively

Neural Information Processing Systems

Large-scale pre-trained language models have recently achieved impressive results on a wide range of downstream tasks. However, fine-tuning an extremely large-scale pre-trained language model on limited target datasets is often plagued by overfitting and representation degradation. In this paper, we propose a Dynamic Parameter Selection (DPS) algorithm for large-scale pre-trained models during fine-tuning, which adaptively selects a more promising subnetwork to perform staged updates based on the gradients from back-propagation. Experiments on the GLUE benchmark show that DPS outperforms previous fine-tuning methods in terms of overall performance and stability, and consistently achieves better results across various pre-trained language models. In addition, DPS brings large improvements in out-of-domain transfer experiments and low-resource scenarios, which shows that it can maintain stable general contextual features and mitigate representation collapse.
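
The core idea, selecting which parameters to update based on the gradients of back-propagation, can be illustrated with a short sketch. The following is a minimal PyTorch illustration assuming a hypothetical keep-ratio hyperparameter `rho`; it is not the authors' DPS implementation (which performs selection in stages), only the gradient-magnitude selection idea.

```python
import torch

def masked_update_step(model, loss, optimizer, rho=0.3):
    """One update step that keeps only the top-`rho` fraction of each
    parameter tensor's entries (ranked by gradient magnitude) and zeroes
    the remaining gradients before the optimizer step, so only the
    selected subnetwork is updated.

    NOTE: `rho` is a hypothetical hyperparameter introduced for this
    sketch, not a name from the paper.
    """
    optimizer.zero_grad()
    loss.backward()
    for p in model.parameters():
        if p.grad is None:
            continue
        g = p.grad.abs().flatten()
        k = max(1, int(rho * g.numel()))
        # Smallest gradient magnitude that still makes the top-k cut.
        threshold = torch.topk(g, k).values.min()
        # Zero out gradients of the non-selected entries.
        p.grad.mul_((p.grad.abs() >= threshold).to(p.grad.dtype))
    optimizer.step()
```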


Appendix for "Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively"

Neural Information Processing Systems

In Sec. 3.3, we experimentally verified that DPS outperforms various fine-tuning methods. Table 1 lists the eight datasets used in this paper, drawn from the GLUE benchmark. We investigate the performance of DPS on five distinctive and widely used large-scale pre-trained language models, including BERT [Devlin et al., 2018] and RoBERTa [Liu et al., 2019]; DeBERTa improves on Transformer-based pre-trained models with a disentangled attention mechanism and an enhanced mask decoder. We use mixed-precision training to speed up the experimental process; this method is also applied by ELECTRA when fine-tuning on downstream tasks.

Appendix D. Experimental Details for Different Fine-tuning Methods

The following is our hyperparameter search space for the different fine-tuning regularization methods. Mixout: we grid-search the Mixout probability p ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8}, as sketched below.
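
For concreteness, the Mixout search is a plain grid-search loop over the eight probabilities above. This is a minimal sketch: `finetune_and_eval` is a hypothetical helper (not from the paper) standing in for a full fine-tuning run with Mixout regularization at probability p followed by dev-set evaluation.

```python
# Sketch of the Mixout hyperparameter grid search.
# NOTE: `finetune_and_eval` is a hypothetical stand-in that fine-tunes
# with Mixout probability `mixout_p` and returns a dev-set metric.
best_p, best_score = None, float("-inf")
for p in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8):
    score = finetune_and_eval(mixout_p=p)
    if score > best_score:
        best_p, best_score = p, score
print(f"best Mixout p = {best_p} (dev score = {best_score:.4f})")
```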

